Kevlar: Transitioning Helix from Research to Practice

1.0  SUMMARY 

Security  weaknesses  in  DoD  (Department  of  Defense)  information  systems  remain  a  major 
challenge  for  system  stakeholders.  We  have  advanced  the  transition  of  technology  developed 
under the Helix and PEASOUP (Preventing Exploits Against Software of Uncertain Provenance)
projects to protect Air Force systems of interest. The results are expected to be an asset that, if
widely deployed by the DoD, would enable a high level of confidence in the security of DoD
systems, in particular, confidence that certain classes of critical vulnerabilities are no longer
subject to possible exploitation.

Weaknesses  in  software  code  (such  as  memory  overwriting  errors,  fixed-width  integer 
computation  errors,  input  validation  oversights,  and  format  string  vulnerabilities)  remain 
common.  Exploiting  these  weaknesses,  attackers  are  able  to  hijack  an  application’s  intended 
control  flow  to  violate  security  policies  (exfiltrating  secret  data,  allowing  remote  access, 
bypassing  authentication,  or  eliminating  services).  To  mitigate  and  defend  against  attacks  that 
seek  to  exploit  such  weaknesses,  we  have  developed  the  Helix  architecture.  Helix  represents  the 
culmination of over 10 years of R&D with support from the Defense Advanced Research Projects
Agency (DARPA), the National Science Foundation (NSF), the Army, and the Air Force, and
ongoing support from the Intelligence Advanced Research Projects Activity (IARPA).

We  have  leveraged  the  opportunity  to  take  the  Helix  architecture  one  step  closer  to 
deployment  in  real  systems  by  enhancing  it  to  be  a  completely  automatic  system  for  securing 
applications  against  attack  by  well-funded,  determined  malicious  adversaries.  Helix/Kevlar 
armors  binary  programs  and  protects  them  from  attacks,  which  could  arise  from  the  inevitable 
vulnerabilities that remain after deployment. Neither the source code of the application to be
protected nor any other development artifact is required. These features make Helix/Kevlar of particular
value  for  software  systems  that  have  to  be  used  but  for  which  no  development  information  is 
available,  or  for  which  significant  portions  of  the  system  include  handwritten  assembly  code  and 
special-purpose  libraries. 

The  key  security  technologies  used  by  Helix/Kevlar  are  protective  transformations  and 
targeted  recovery.  The  protective  transformations  are  applied  to  application  binaries  before  they 
are deployed. Conceptually, these transformations are tailor-made, lightweight “armor” that
prevents an attacker from exploiting residual vulnerabilities in a wide variety of classes.
Helix/Kevlar  uses  novel,  fine-grained,  high-entropy  diversification  transformations  to  prevent  an 
attacker  from  successfully  exploiting  vulnerabilities.  To  prevent  attacks  from  causing  the  system 
to  act  in  undesirable  ways,  such  as  crashing  or  performing  unintended  actions,  Helix/Kevlar  also 
provides  custom-made,  application-specific  remediation  strategies  that  may  be  invoked  in  the 
event  of  an  attack. 

An  important  development  for  the  project  was  the  integration  of  our  static  binary  rewriting 
technology  into  Helix/Kevlar.  With  the  addition  of  the  binary  rewriting  technology  (called  Zipr) 
into  Helix/Kevlar,  we  can  instantiate  protections  statically  using  Zipr  or  dynamically  using 
Strata. Zipr is appropriate for resource-constrained devices (e.g., embedded systems, Internet of
Things) and applications that include virtual machines such as Java and JavaScript, while Strata is
suited to systems where a moving target defense is appropriate.

Helix/Kevlar has several major strengths: (a) it is applied to binaries and does not depend on
particular languages, compilers, or libraries, (b) it is complementary to other security techniques
including inspection, static analysis, and testing, (c) it requires no changes to the software
development process, and (d) preliminary performance measurements show that the armoring
provided by Helix/Kevlar is lightweight, incurring modest run-time performance overhead of
around 10% when dynamic translation is used, and less than 5% when static rewriting is used.

Another notable achievement of the project was that we retargeted Strata to Windows (64-bit).
With this addition, Helix/Kevlar can now be applied to Windows applications.

In summary, major accomplishments of the project include:

•  Retargeted  Strata,  Helix/Kevlar’s  dynamic  translator,  to  64-bit  Windows, 

•  Integrated Zipr, an efficient static binary rewriter, into Helix/Kevlar,

•  Filed a U.S. patent on the Zipr technology (Title: System, Method and Computer Readable
Medium  for  Space-Efficient  Binary  Rewriting), 

•  Demonstrated  the  ability  to  apply  Helix/Kevlar  transforms  to  the  core  libraries  of  the  Java 
VM, 

•  Collaborated with Northrop Grumman to evaluate the effectiveness of
Helix/Kevlar  in  protecting  applications, 

•  Accepted paper at the 10th IET System Safety and Cyber-Security Conference describing
Helix/Kevlar’s  protection  of  binary  programs, 

•  Accepted paper at the 45th Annual IEEE/IFIP International Conference on Dependable
Systems and Networks describing a new taint inference technique for defeating web
application  attacks,  and 

•  Accepted paper at the 11th Annual Cyber and Information Security Research Conference
describing  how  Helix/Kevlar  can  be  used  to  defeat  blind  ROP  attacks. 

2.0  INTRODUCTION

Security  weaknesses  in  DoD  information  systems  remain  a  major  challenge  for  system 
stakeholders.  To  mitigate  and  defend  against  attacks  that  seek  to  exploit  such  weaknesses,  we 
have developed the Helix architecture. Helix represents the culmination of over 10 years of
research and development (with support from DARPA, the National Science Foundation, the
Army and the Air Force, and IARPA). Salient features of Helix/Kevlar include high-entropy
randomization techniques, automated program repair, highly optimized
virtual machine technology, and, more generally, a novel framework for program analysis,
transformation, and composition. We propose to transition technology developed under the Helix
and PEASOUP projects to protect Air Force systems of interest. We expect the result to be an
asset that, if widely deployed by the DoD, would enable a high level of confidence in the security
of DoD systems, in particular, confidence that certain classes of critical vulnerabilities are no
longer subject to possible exploitation.

The  next  two  sections  describe  the  Helix/Kevlar  architecture  and  our  plans  to  transition  this 
technology  so  that  it  can  be  used  to  protect  current  and  future  Air  Force  systems.  The  major 
component of this effort is to develop Helix/Kevlar, a robust, easy-to-use tool for applying the
Helix  technology  to  real  systems. 


3.0  METHODS,  ASSUMPTIONS,  AND  PROCEDURES 
3.1  Helix 

A  fundamental  problem  with  current  defenses  is  that  they  do  not  redress  the  asymmetry  between 
attackers  and  defenders,  changing  the  target  system  only  slowly  and  reactively  in  response  to 
attacks.  Even  approaches  that  incorporate  intrusion  detection  and  tolerance  have  proven 
ineffective  against  determined  and  well-funded  attackers  who  have  at  their  disposal  a  growing 
arsenal  of  evasive,  stealthy,  adaptive,  polymorphic  and  metamorphic  attacks.  To  cope  with  such 
sophisticated  attacks,  the  Helix  architecture  uses  a  combination  of  defense  mechanisms  that  is 
both  highly  effective  and  metamorphic,  i.e.,  a  high-entropy  metamorphic  shield,  that  presents 
attackers  with  a  continuously  changing  attack  surface. 

Figure  1  provides  a  high-level  conceptual  overview  of  the  Helix  architecture.  An  application 
running  in  Helix  is  treated  in  a  holistic  way,  with  information  being  shared  across  development, 
deployment,  execution,  and  response  phases  in  ways  that  are  not  possible  with  traditional 
architectures. 

Instead  of  viewing  the  standard  tool  chain  as  just  a  series  of  steps  to  transform  an  application 
from  source  code  to  executable  form,  we  take  a  more  comprehensive  view  in  which  program 
metadata  can  be  deposited  in  the  Intermediate  Representation  Database  (IRDB),  and 
subsequently  manipulated  and  enhanced  at  all  phases  of  a  program’s  lifecycle,  to  enable  the 
development of novel and accurate security protection algorithms. Starting with applications in
source or binary form as input, Helix proactively analyzes and transforms applications to
augment them with self-sensing and self-protection capabilities. Helix enables innate and
adaptive actions in response to, or in anticipation of, attacks by running applications under control of
Strata,  a  lightweight  virtual  machine  known  as  a  software  dynamic  translator  (SDT).  Strata 
provides  the  ability  to  rewrite  application  code  on-demand  for  deploying  security  protections 
and/or  dynamically  shifting  the  attack  surface  of  applications. 
Figure 1: High-level conceptual overview of the Helix architecture



Helix is a multi-faceted research vision that has led to many key results. However, because
Helix is a research project, some of its ideas are more suited to current, real-world use than others.
To transition the best, most deployable of these ideas to practice, from Technology Readiness
Level 5 (TRL-5: testing of integrated technology components in a representative environment) to
Technology Readiness Level 6 (TRL-6: prototype implementation on full-scale realistic systems), we
introduce Helix/Kevlar. Kevlar directly leverages the following capabilities from the Helix
project:

•  The concept of analyzing and storing meta-information about software in the
Intermediate Representation Database is key to enabling various security transformations.

•  High-precision static and binary analysis of binaries.

•  Helix  incorporates  several  novel  high-entropy  randomization  techniques. 

•  Helix  significantly  advanced  the  use  of  fast  dynamic  binary  rewriting  techniques  for 
armoring  binaries  without  requiring  the  availability  of  source  code. 

•  Helix  leverages  the  Strata  virtual  machine  technology  for  transparently  augmenting 
binaries  with  self-sensing,  self-diversification,  self-protection  and  self-repair  capabilities. 

Overall,  Helix  provides  the  intellectual  framework  for  quickly  developing  and  fielding  new 
security  transformations.  The  next  section  describes  how  Helix/Kevlar  will  exploit  Helix- 
developed  capabilities  in  order  to  begin  an  effective  transition  from  research  to  practice. 

3.2  Helix/Kevlar Architecture

Helix/Kevlar  is  a  completely  automatic  system  for  securing  applications  against  attack  by  well- 
funded,  determined  malicious  adversaries.  It  armors  binary  programs  and  protects  them  from 
attacks that could arise from the inevitable vulnerabilities that remain after deployment. Neither
the source code nor any other development artifact (such as object files, debugging information,
or linker maps) is required. Enabling the rapid development of security
transformations and enabling their safe composition are hallmarks of the Helix/Kevlar system.
Helix/Kevlar operates in two phases: (1) an offline phase in which Helix/Kevlar performs deep
analyses on binaries, records the results in an intermediate representation database (called the
IRDB), and then uses the database to generate and vet Sprockets, i.e., specifications for
security transformations; and (2) an online phase in which these Sprocket specifications are
applied. They can be applied using Strata, a state-of-the-art dynamic binary rewriter [25, 34]. In
addition, in this project, we developed Zipr, a highly efficient static binary rewriter [8]. Zipr
provides the ability to apply Helix/Kevlar protections to systems that use self-modifying code
(e.g., just-in-time compilers such as those used by Java).

Figure 2 shows the high-level architecture of the offline, or pre-deployment, portion of
Helix/Kevlar. Helix/Kevlar includes a static analyzer, called STARS [10], that disassembles
x86 binaries, performs extensive static analysis of the binary, and then persistently stores the
results of the analysis, along with the binary, in the IRDB.

Figure 2: Helix/Kevlar Architecture: Offline generation of Sprocket programs

A Helix/Kevlar transformation phase uses information in the IRDB to create new versions of
a binary, called variants, where various armoring transformations and remediation policies have
been  applied.  A  novel  aspect  of  Helix/Kevlar  is  that,  rather  than  statically  rewrite  the  binary, 
Helix/Kevlar  produces  programs,  called  Sprockets,  that  are  used  by  the  Helix/Kevlar  rewriters  to 
transform  the  original  binary  into  the  corresponding  variant  at  run  time. 

To ensure that the variants produced by Helix/Kevlar run appropriately, they are then
“vetted” by two tools, BED (Behavior Equivalence Detection) and TSET (Test Suite
Evaluation Technology). BED runs each variant using a regression test suite to ensure that the
variant  produces  the  same  output  as  the  original  binary  while  TSET  seeks  to  measure  confidence 
levels  of  the  results  reported  by  BED.  In  addition,  BED  uses  a  fault  injector  to  inject  faults  into 
the  application  to  determine  the  effectiveness  of  the  Helix/Kevlar  remediation  policies. 
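
The vetting step performed by BED can be illustrated with a minimal sketch. It is not the BED tool itself: plain Python functions stand in for the original binary and its variants, and the names vet_variant, original_program, good_variant, broken_variant, and REGRESSION_INPUTS are hypothetical and introduced only for illustration. The sketch runs every regression input through both versions and reports the inputs on which the outputs differ.

```python
def vet_variant(original, variant, regression_inputs):
    """Return (passed, failing_inputs) for a variant against the original.

    `original` and `variant` are Python callables standing in for the
    original binary and a transformed variant (an illustrative assumption).
    """
    failing = [x for x in regression_inputs if original(x) != variant(x)]
    return (not failing, failing)


def original_program(text):
    # Stand-in for the untransformed binary.
    return text.upper()


def good_variant(text):
    # Different implementation, same observable behavior.
    return "".join(ch.upper() for ch in text)


def broken_variant(text):
    # A transformation gone wrong: output is truncated to three characters.
    return text.upper()[:3]


# Hypothetical regression test suite.
REGRESSION_INPUTS = ["abc", "hello", ""]
```

A variant is accepted only when the list of failing inputs is empty. How much confidence that result deserves depends on the quality of the test suite, which is the question TSET is described as addressing.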


Figure 3: Helix/Kevlar Architecture: Online selection of Sprocket programs

Figure 3 shows a deployed binary that is protected by Helix/Kevlar. In this diagram, the
vetted Sprocket programs are applied to the binary by the software dynamic translator, Strata.
Helix/Kevlar has the ability to dynamically select from the set of Sprocket programs to effect
temporal change in the protections that are applied. Such changes could be triggered periodically
or because an attack has been sensed and remediated and it is desired to add additional
protections or to apply different remediation policies.

We highlight several major analysis and security transformations supported by Helix/Kevlar.
Each transformation can be used in isolation, or composed with other transformations for added
protection.

3.2.1  Intermediate  Representation  Database  (IRDB) 

To facilitate multiple simultaneous transformations to a program, Helix/Kevlar uses an
intermediate representation (IR) held in a database, which we term the IR database (IRDB).

The IRDB is similar to the IR for a traditional compiler. It contains information about the
program, such as the instructions that make up the program, their addresses, and their control and
data flow. Furthermore, it contains information about each function (including the stack
layout, entry points, and exit points), the global data layout of the program, and the targets of
indirect branches. Program information is added to the IRDB by various tools, including a binary
static analyzer called STARS that is discussed in the next section.

Unlike a traditional IR for a compiler, the IR in the database is not guaranteed to be 100%
accurate. We realistically assume that information such as perfectly accurate disassembly of the
program is not available. We make this assumption to facilitate binary analysis and
transformation, where such information is rarely available. Issues of imperfect analysis can be
compounded if different analyses disagree on information. For example, we use both STARS and
the Linux utility objdump to populate the list of instructions in the IRDB. The two tools
typically agree on instruction start addresses, but occasionally they disagree. The IRDB
accommodates such disagreements by storing conflicting information together with a
confidence metric.
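
One way to picture conflicting information carrying a confidence metric is the following minimal sketch. It does not reflect the actual IRDB schema, which is not described here; the class InstructionStartTable, its methods, and the numeric confidence values are hypothetical. Each tool records a claim about whether an address starts an instruction, and a consumer can list the disputed addresses or take the most confident claim.

```python
from collections import defaultdict


class InstructionStartTable:
    """Hypothetical store of per-address claims; not the real IRDB schema."""

    def __init__(self):
        # address -> list of (source, is_instruction_start, confidence)
        self._claims = defaultdict(list)

    def add_claim(self, address, source, is_start, confidence):
        if not 0.0 <= confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")
        self._claims[address].append((source, is_start, confidence))

    def conflicts(self):
        """Return the sorted addresses on which the sources disagree."""
        return sorted(
            addr
            for addr, claims in self._claims.items()
            if len({is_start for _, is_start, _ in claims}) > 1
        )

    def resolve(self, address):
        """Return the claim with the highest confidence, or None."""
        claims = self._claims.get(address)
        if not claims:
            return None
        return max(claims, key=lambda claim: claim[2])


table = InstructionStartTable()
table.add_claim(0x8000, "STARS", True, 0.9)
table.add_claim(0x8000, "objdump", True, 0.7)
table.add_claim(0x8004, "STARS", False, 0.8)    # STARS: data inside code
table.add_claim(0x8004, "objdump", True, 0.6)   # objdump: instruction start
```

In this example both tools agree on 0x8000 and disagree on 0x8004; resolving 0x8004 returns the STARS claim because it was recorded with the higher (assumed) confidence.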

Another key feature of the IRDB is the ability to “clone” a program. A cloned program is
identical to the program it was cloned from, except with a new name and extra information to
track the creation of clones. The primary purpose of a clone is to facilitate programmatic
experimentation. For example, suppose that we wish to determine which remediation technique
would be effective for a given program. We might choose to clone the program, then instrument
the  program  with  the  remediation  technique  for  testing. 

The  clone  feature  has  one  other  primary  purpose:  namely  we  need  a  “before”  and  “after” 
version  of  the  program  to  support  automatic  generation  of  the  Sprockets  needed  to  execute  the 
modified  program.  By  tracing  a  cloned  program’s  hierarchy  back  to  the  untransformed  program, 
we  can  successfully  generate  Sprockets  to  represent  the  changes  between  the  original  program 
and the transformed program. Because generation only needs to examine these differences, it is
fast, efficient, and reliable.

Helix/Kevlar’s IRDB is implemented using PostgreSQL. Measurements show that it is fast
and efficient.

3.2.2  Static Analysis for Reliability and Security (STARS)

A key component of Helix/Kevlar is STARS (STatic Analyzer for Reliability and Security). It
was developed to determine certain security properties of a binary program [10]. STARS is
implemented as a plug-in to the popular IDA Pro disassembler [9]. The static analyzer currently
operates on Linux/x86 binaries, although it can be targeted to any platform that is targeted by
IDA Pro. Currently, IDA Pro targets more than 40 processors and operating platforms.

A key function of STARS in Helix/Kevlar is the identification of the instructions of the
application. As discussed by Debray and Andrews, precisely disassembling a binary is, in
general, not a solvable problem [24]. In practice, STARS rarely misidentifies data as code. As
noted by Debray and Andrews, such misidentifications would be disastrous for a static code
rewriter. Because Helix/Kevlar uses software dynamic translation to transform code,
Helix/Kevlar is able to tolerate any inaccuracies: we will never rewrite data as code, because the
rewrite process occurs during the fetch/execute/translate phase of the dynamic translator. That is,
only code that should be executed is processed.

The static analyzer analyzes the control flow and data flow of the entire program binary. The
analyzer builds a fully pruned SSA (Static Single Assignment) form representation of the
program and performs numerous data flow analyses on this representation [4, 31]. The data flow
analyses include a simplified type system, in which registers and stack locations are typed as
being data pointers, integers, floating-point values, strings, or code pointers.

All information determined by STARS is recorded in the IRDB. Later analysis and
transformation phases of Helix/Kevlar use this information. During these phases, as additional or
more accurate information becomes known, information in the IRDB is updated.


3.2.3  Sprocket Execution Engines

In Helix/Kevlar, transformations to the binary are applied either statically or dynamically. Each
approach has advantages. Static rewriting typically has a smaller memory footprint and lower
run-time overhead. Dynamic rewriting supports a moving target defense by constantly
transforming the application. It also permits a single binary to be deployed with transformations
applied dynamically to create an ever-changing attack surface, the metamorphic shield. While
the overhead of dynamic rewriting is reasonable, for resource-constrained devices, static
rewriting may be preferred. The rewrites to be applied are specified by Sprocket programs
expressed using the Sprocket Program Rewriting Interface (SPRI).


SPRI  and  Sprocket  Generation 

SPRI defines simple rewriting rules that come in two forms. The first form, the redirect form,
transfers control to a specified target address (lines 1 and 3 in Figure 4). The second form, the
instruction definition form, indicates that there is an instruction at a particular location (line 2).
The net effect of applying the SPRI rules shown in Figure 4 is to rewrite the
sub esp, 20 instruction at address 0x8000 to be sub esp, 40. The stack layout transformation
(described in Section 3.3.2) uses such rules to transform stack frame allocations.

Together these two types of rules provide the foundation for building a wide range of
Sprocket programs. The example shown illustrates the equivalent of a small patch that modifies
only one instruction. At the other end of the scale, transformations such as ILX (instruction location
transformation, described in Section 3.3.1) seek to rewrite all instructions in a binary.


Original program fragment:

    (a)  0x8000      sub esp, 20

Rewrite rule:

    (1)  0x8000  ->  0xFF00
    (2)  0xFF00  **  sub esp, 40
    (3)  0xFF01  ->  0x8001

Figure 4: Sprocket rewrite rule to change stack frame allocation. For exposition purposes,
all instructions are 1 byte long.
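
The effect of the two SPRI rule forms can be sketched with a small interpreter. This is an illustrative model only, not the Strata or Zipr implementation: the textual rule syntax is taken from Figure 4, while the function names parse_spri, fetch, and run, the "ret" instruction placed at 0x8001, and the 1-byte instruction length are assumptions made for exposition.

```python
def parse_spri(lines):
    """Split rule lines into redirects (addr -> addr) and definitions (addr ** insn)."""
    redirects, definitions = {}, {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if "->" in line:
            src, dst = (part.strip() for part in line.split("->", 1))
            redirects[int(src, 16)] = int(dst, 16)
        elif "**" in line:
            addr, insn = (part.strip() for part in line.split("**", 1))
            definitions[int(addr, 16)] = insn
        else:
            raise ValueError(f"unrecognized rule: {line!r}")
    return redirects, definitions


def fetch(pc, original, redirects, definitions):
    """Follow redirect rules, then return (effective_pc, instruction)."""
    seen = set()
    while pc in redirects:
        if pc in seen:
            raise ValueError("redirect cycle")
        seen.add(pc)
        pc = redirects[pc]
    if pc in definitions:
        return pc, definitions[pc]
    return pc, original[pc]


def run(start, count, original, redirects, definitions):
    """Fetch `count` instructions, assuming 1-byte instructions as in Figure 4."""
    pc, trace = start, []
    for _ in range(count):
        pc, insn = fetch(pc, original, redirects, definitions)
        trace.append(insn)
        pc += 1
    return trace


# 0x8001 holding "ret" is a hypothetical continuation of the fragment.
ORIGINAL = {0x8000: "sub esp, 20", 0x8001: "ret"}
RULES = ["0x8000 -> 0xFF00", "0xFF00 ** sub esp, 40", "0xFF01 -> 0x8001"]
REDIRECTS, DEFINITIONS = parse_spri(RULES)
```

Running two fetches from 0x8000 yields sub esp, 40 followed by the original instruction at 0x8001, which matches the net effect described for Figure 4: the stack allocation is changed and control returns to the original code.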

Despite its conceptual simplicity, manually writing Sprockets in SPRI would be a tedious
and error-prone process. Instead, Sprocket developers apply their transformations using a
high-level C/C++ API (application program interface) to manage the creation and deletion of program
variants, and to manipulate program state, e.g., to insert, delete, or replace instructions and
reroute control flow. The API transparently interacts with the IRDB to commit any changes.

With this architecture, the composition of Sprockets is naturally performed by chaining
together transformations: one Sprocket encodes its transformation in the IRDB, the next Sprocket
then takes as input the new database state, and then effects its own transformations. Helix/Kevlar
then automatically generates SPRI rules for any program variants by essentially performing a
“smart diff” between the IRDB representations of a variant and the original binary.
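
The idea of deriving SPRI rules from a diff between the original program and a variant can be sketched as follows. This is a deliberately simplified model under stated assumptions: programs are dictionaries from address to instruction text, instructions are 1 byte long as in Figure 4, and the names grow_frame, smart_diff, and free_base are hypothetical. The real generator works on IRDB representations and is not described at this level of detail.

```python
def grow_frame(program):
    """Hypothetical transformation: enlarge a 20-byte stack allocation to 40 bytes."""
    variant = dict(program)  # work on a clone, leaving the original untouched
    for addr, insn in program.items():
        if insn == "sub esp, 20":
            variant[addr] = "sub esp, 40"
    return variant


def smart_diff(original, variant, free_base=0xFF00):
    """Emit SPRI-style rules for every instruction that differs in the variant.

    Each changed instruction is relocated to unused address space starting at
    `free_base` (an assumed free region) and control is redirected back to the
    instruction that follows it in the original program.
    """
    rules = []
    next_free = free_base
    for addr in sorted(original):
        new_insn = variant.get(addr, original[addr])
        if new_insn == original[addr]:
            continue
        rules.append(f"0x{addr:04X} -> 0x{next_free:04X}")
        rules.append(f"0x{next_free:04X} ** {new_insn}")
        rules.append(f"0x{next_free + 1:04X} -> 0x{addr + 1:04X}")
        next_free += 2
    return rules


ORIGINAL_PROGRAM = {0x8000: "sub esp, 20", 0x8001: "ret"}
VARIANT_PROGRAM = grow_frame(ORIGINAL_PROGRAM)
GENERATED_RULES = smart_diff(ORIGINAL_PROGRAM, VARIANT_PROGRAM)
```

For this one-instruction change the generated rules are the three rules of Figure 4. Chaining corresponds to applying a second transformation to VARIANT_PROGRAM before diffing against the untransformed original.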

Once the SPRI rules are generated, Helix/Kevlar can use either Strata or Zipr to instantiate
the transformations.


Strata 

Figure 3 shows the dynamic translation of a binary using Strata to apply the transformations
specified by the Sprockets. While we use Strata as our underlying SDT infrastructure, we note
that Sprockets could be similarly implemented via any SDT tool [3, 17, 19, 25, 26, 29].

Strata dynamically loads an application and mediates application execution by examining
and translating an application’s instructions before they execute on the host CPU. Strata operates
as a co-routine with the application that it is protecting. Translated application instructions are
held in a managed cache called a fragment cache. Strata is first entered by capturing and saving
the application context (e.g., program counter (PC), condition codes, registers, etc.). Following
context capture, Strata processes the next application instruction. If a translation for this
instruction has been previously cached, Strata transfers control to the cached translated
instructions.

If there is no cached translation for the next application instruction, Strata allocates storage
in the fragment cache for a new fragment of translated instructions. Strata then populates the
fragment by fetching, decoding, and translating application instructions one by one until an
end-of-fragment condition is met (e.g., an indirect branch). As the application executes under Strata’s
control, more and more of the application’s working set of instructions materialize in the
fragment cache.
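
The fragment cache behavior described above can be sketched in a few lines. This is a toy model and not Strata's implementation: instructions are strings indexed by address, "translation" is the identity, and the class FragmentCacheTranslator and the helper is_indirect_branch are hypothetical names. The sketch shows the two paths: a cache hit returns the stored fragment, and a miss builds a new fragment up to an end-of-fragment condition.

```python
class FragmentCacheTranslator:
    """Toy software dynamic translator with a fragment cache keyed by start PC."""

    def __init__(self, program, ends_fragment):
        self.program = program              # list of instructions, index = address
        self.ends_fragment = ends_fragment  # predicate: does this instruction end a fragment?
        self.cache = {}                     # start PC -> list of translated instructions
        self.translations = 0               # number of fragments built so far

    def _translate(self, pc):
        fragment = []
        while pc < len(self.program):
            insn = self.program[pc]
            fragment.append(insn)           # identity translation, for illustration
            pc += 1
            if self.ends_fragment(insn):
                break
        return fragment

    def fragment_for(self, pc):
        """Return the cached fragment for pc, translating it first on a miss."""
        if pc not in self.cache:
            self.cache[pc] = self._translate(pc)
            self.translations += 1
        return self.cache[pc]


def is_indirect_branch(insn):
    # Assumed end-of-fragment condition for this sketch.
    return insn.startswith("jmp *") or insn == "ret"


PROGRAM = ["mov eax, ebx", "add eax, 4", "jmp *eax", "sub esp, 20", "ret"]
translator = FragmentCacheTranslator(PROGRAM, is_indirect_branch)
```

Requesting the fragment at address 0 twice triggers only one translation; requesting address 3 adds a second fragment. Over time the cache accumulates the application's working set, as the text describes.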

Implementation of Sprockets requires several simple extensions to a typical software
dynamic translator. First, we must modify Strata startup code to read the SPRI rewrite rules (not
pictured). Next, Strata’s instruction fetching mechanism is overridden to first check, then read
from SPRI rewrite rules as appropriate. Lastly, the next-PC operation is modified to obey any
redirection rules that are specified.

Finally, we must take steps to protect Strata itself. To thwart a compromised application
from overwriting Strata’s own code or data, we use standard hardware memory protection
mechanisms. When executing the untrusted application code, Strata turns off read, write, and
execute permission on the pages of memory it uses, leaving only execute (but not write)
permission on the code cache. Strata also watches for attempts by the application to change these
permissions. Previous work has shown this technique to be effective and that it costs very
little [13].


Zipr 

As noted previously, dynamic rewriting and software dynamic translation are not appropriate in
all cases. One type of program that causes dynamic rewriting techniques to have high overhead is
a program that generates code dynamically, such as a Java JIT compiler or a JavaScript engine.
Dynamic translation must flush cached code whenever new code is generated, which causes the
translation system to do extra work. Typically, such a program might run at least 3 times slower
than the untranslated program. A new addition to the Helix/Kevlar toolchain during the performance
period was Zipr, a highly efficient static binary rewriter.

Zipr addresses many of the problems suffered by existing static binary rewriters, such as high
space overhead (as much as 2X), the inability to translate arbitrarily compiled programs (such as
code compiled to be position independent), high runtime overhead, and requiring additional
compiler information which may not be available [2, 6, 15, 23, 28, 30, 32, 33, 37].

As part of this contract, we created a new type of static rewriter. Our breakthrough
technology, called Zipr, is capable of re-using existing code segments and disambiguating any
data that may reside within the code. Using deep static analysis provided by STARS and the
analyses provided to build the IRDB, our prototype vastly outperforms existing static rewriting.
Zipr can transform arbitrary binaries, compiled by any compiler, and has modest runtime and
memory overheads. The following paragraphs provide additional details.

Our technique uses pinned addresses, locations of instructions in the original
program/library that may be targeted indirectly at runtime. Addresses of units of data are always
pinned. On the other hand, the address, a, of an instruction, i, in the original program is pinned if
the original program calculates dynamic program control references to a at runtime. In this case,
a will be stored as the pinned address value of i. In other words, pinned address analysis depends
directly on correct calculation of indirect control flow.

Addresses of instructions may be pinned for a number of reasons. Most commonly, however,
addresses are pinned because they are the targets of indirect branches (IB). IB targets (IBTs) appear
in jump tables, immediately after call instructions, at the beginning of functions, etc. Just because
program control reaches an instruction indirectly does not mean that its address must be pinned.
There are cases where the program’s behavior with respect to an IBT can be analyzed and
modeled statically.

For our rewriting methodology to operate correctly it is not necessary to determine the set of
possible targets for every particular indirect branch instruction. Our technique relies only on the
fact that P, the set of all pinned addresses, contains at least all the addresses of IBTs in the
original program. In other words, we rely on the creation of P such that B ⊆ P, where B is the set
containing the addresses of every IBT from the original program.

It is possible to calculate P naively by making the address of every instruction of the original
program a member. This assignment clearly satisfies the requirement. As explained when
describing reassembly, such an assignment does not give the reassembly technique the flexibility
to re-place instructions. Moreover, it does not allow for the creation of an efficient rewritten
binary program.

Ideally B = P. As |P - B| grows, our method generates an increasingly less space-efficient
rewritten binary. Therefore, our algorithm leverages a set of heuristics that analyze the original
program’s CFG to select pinned addresses. Again, it is imperative that our technique be
conservative; missing a pinned address will cause our rewriting algorithm to generate a
transformed binary that does not operate correctly.
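
The two properties discussed above, that P must contain every IBT address and that the excess
|P - B| should stay small, can be illustrated with a minimal sketch. The function names and the
sorted-array representation are assumptions made for illustration only; they are not taken from
the Helix/Kevlar implementation, and a real rewriter does not know B exactly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if every address in b[] also appears in p[], i.e. B is a subset of P.
 * Both arrays must be sorted in ascending order and free of duplicates. */
int pins_cover_targets(const uint32_t *p, size_t np,
                       const uint32_t *b, size_t nb)
{
    size_t i = 0, j = 0;
    assert(p != NULL && b != NULL);
    while (j < nb) {
        while (i < np && p[i] < b[j])
            i++;                 /* skip pins that lie below the next target */
        if (i == np || p[i] != b[j])
            return 0;            /* an IBT is not pinned: rewriting is unsafe */
        i++;
        j++;
    }
    return 1;
}

/* Counts pinned addresses that are not real IBTs, i.e. |P - B|.
 * Each one costs space in the rewritten binary but not correctness. */
size_t excess_pins(const uint32_t *p, size_t np,
                   const uint32_t *b, size_t nb)
{
    size_t i, j = 0, excess = 0;
    assert(p != NULL && b != NULL);
    for (i = 0; i < np; i++) {
        while (j < nb && b[j] < p[i])
            j++;
        if (j == nb || b[j] != p[i])
            excess++;
    }
    return excess;
}
```

Pinning every instruction address makes pins_cover_targets() succeed trivially but maximizes
excess_pins(); dropping a true IBT from P makes the first check fail, which corresponds to the
incorrect-binary case described above.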

For a more detailed description of the algorithms used to identify pinned addresses of
instructions and data, see Hiser et al. [11] and Zhang et al. [36]. For binaries generated by GCC,
the target compiler of Helix/Kevlar, our prototype implementation, we are able to handle very
complex programs including libraries such as glibc. Empirical evidence suggests that
Helix/Kevlar works for programs generated by LLVM as well.

Pinned addresses of instructions play an important role in reassembly. Throughout the
rewriting algorithm, a pinned address, a, of an instruction in the original program corresponds to
exactly one instruction, i. IR Construction assigns the original correspondence between a and i.
During the Transformation phase, one or more transformations will change i to i' and a will still
correspond to i'. For the modified program to function according to the semantics of the original
program, as subsequently modified through user-specified transformations, when the transformed
program’s program counter (PC) reaches address a, instruction i' must be executed. The
Reassembly phase (described later) maintains this condition.

Figure 5: Pinned addresses, transformations and references.

Figure 5 shows an example of this process. Instruction i is associated with pinned address a.
The Pad Stack transformation modifies i so it allocates a larger stack. The modified instruction,
i', is still associated with a. When i' is eventually placed at address 0x30A3 in the modified
program, the reference at a is updated appropriately.

Zipr’s  Transformation  phase  modifies  the  original  program’s  IR.  User-specified  transforms 
are  optional  transformations  that  modify  or  add/remove  functionality  to/from  the  original 
program.  Mandatory  transforms  make  it  possible  for  the  user-specified  transforms  to  modify  the 
original  program’s  IR  without  regard  for  the  details  of  the  specific  target  platform. 

Mandatory transformations in the Transformation phase produce a modified IR that makes it
possible for the reassembly algorithm to place recreated instructions arbitrarily in the modified
program’s address space.

Mandatory transformations most commonly address issues with the target platform and its
ISA. For example, many x86 instructions can use PC-relative addressing. The jump instruction is
one such instruction.

Assume instruction i_1 transfers control to i_2 with a jump. On an x86, i_1 is a jump to a_2, the
address of i_2. However, a_2 is encoded in i_1 relative to a_1. To be able to relocate instructions,
relationships like these that rely on the instructions’ addresses in the original program have to be
translated into logical links. Fortunately, the IR is built using logical connections among
instructions. Returning to the example, the IR links i_1 to i_2, not a_2. Memory operations (loads
and stores) may also be PC-relative.
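
As a concrete illustration of why PC-relative encodings must become logical links, the sketch
below decodes the target of an x86 near jump (opcode 0xE9 followed by a 32-bit little-endian
displacement measured from the end of the instruction) and re-encodes a jump to the same target
from a new location. The helper names are hypothetical and this is not the Helix/Kevlar code; it
only shows that the displacement bytes change when the instruction moves while the logical
target does not.

```c
#include <assert.h>
#include <stdint.h>

#define JMP_REL32_LEN 5u   /* opcode 0xE9 plus a 4-byte displacement */

/* Recover the absolute target (the logical link) of a jmp rel32 at insn_addr. */
uint32_t jmp_rel32_target(uint32_t insn_addr, const uint8_t enc[5])
{
    uint32_t disp;
    assert(enc[0] == 0xE9);
    disp = (uint32_t)enc[1]
         | ((uint32_t)enc[2] << 8)
         | ((uint32_t)enc[3] << 16)
         | ((uint32_t)enc[4] << 24);
    /* The displacement is relative to the next instruction; unsigned
     * arithmetic wraps modulo 2^32, which handles backward jumps. */
    return insn_addr + JMP_REL32_LEN + disp;
}

/* Emit a jmp rel32 located at new_addr that reaches the same target. */
void encode_jmp_rel32(uint32_t new_addr, uint32_t target, uint8_t out[5])
{
    uint32_t disp = target - (new_addr + JMP_REL32_LEN);
    out[0] = 0xE9;
    out[1] = (uint8_t)(disp & 0xFF);
    out[2] = (uint8_t)((disp >> 8) & 0xFF);
    out[3] = (uint8_t)((disp >> 16) & 0xFF);
    out[4] = (uint8_t)((disp >> 24) & 0xFF);
}
```

Copying the original five bytes to a new address unchanged would send control to a different
location, which is the execution error the mandatory transformations are meant to prevent.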

Unless this situation is handled, PC-relative instructions’ placement in the modified program
at different addresses will cause an error during execution of the modified program. Each target
platform’s ISA is different and our method’s modular approach makes it possible for the user to
apply as many mandatory transformations as necessary to accommodate the target platform.
Helix/Kevlar includes all the required mandatory transformations for the x86 and x86-64
platforms.

Once all the mandatory transformations are applied, our technique applies any user-specified
transformations. These are transformations that the user implements that will modify the original
program. As mentioned earlier, there are many ways a user could modify the original program to
improve its security, reliability and dependability.

Instead of forcing the user to choose from a set of predefined transformations, Helix/Kevlar
provides the user an API to develop their own. The API allows the user to iterate through the
functions and instructions of the original program. Users can change (modify or replace) or
remove instructions. They can even add new instructions or specify how to link in pre-compiled
program code and execute functions therein.

At the heart of our approach’s novel reassembly technique is an algorithm that carefully
reassembles the modified IR into a series of instructions and units of data which are then
assigned a location in the modified program’s address space.

The process begins by creating references in the modified program at the pinned addresses
from the original program. These references target a pinned address’s associated instruction or
unit of data, as explained previously. The targets of those references (and their fall-through
instructions) are placed arbitrarily in the remaining free space and the references are marked as
resolved. In the process of resolving the initial set of references, new unresolved references may
be introduced. The targets of those references are again placed arbitrarily in the remaining free
space and the references are resolved. The process continues until there are no more unresolved
references.

Figure 6 illustrates the internal state of the algorithm as it reassembles a program. The
following subsections explain the reassembly algorithm in detail.

At the outset, the modified program’s text segment is empty. The data segment is copied
directly from the original program. The reassembly algorithm begins by placing unresolved
constrained references at pinned addresses.

A reference is a link to data or instructions in a dollop. A dollop is a linear sequence of
instructions linked by their fallthroughs. References are unresolved when they link to data or
dollops in IR form. When a dollop is reconstructed from its IR into instructions and assigned a
location in the modified program’s address space, references are resolved to those particular
addresses. In Figure 6, r_1, r_2 and r_3 are references; r_1 and r_2 are resolved and r_3 is unresolved.

A reference is constrained when there is a restriction on where its target may be placed
within the modified program’s address space. Because a reference includes an address (its
target), the implementation of the reference itself must be at least as large as the encoding of that
address. The size available for encoding the address may be limited when the addresses of two
adjacent instructions are pinned.

In Figure 6, there is a 2-byte instruction at 0x4000EE and a 3-byte instruction at
0x4000F0 and both have pinned addresses. The reference to the transformed instruction i' will
have to encode the instruction’s address when it is placed. No matter how the ISA encodes
addresses (relative or absolute), the encoding cannot exceed two bytes without interfering with
the reference at the adjacent pinned address. If the ISA does not support addressing the full
address space in two bytes, the reference at 0x4000EE must be constrained.

Once a constrained unresolved dollop reference is placed at each pinned address, the
reassembly algorithm determines which references can be unconstrained. Depending upon the
ISA of the implementation target, there is a minimum size, s, necessary to store an instruction
that addresses the entire address space. If the space between adjacent constrained unresolved
references r_i and r_{i+1} is greater than s, our algorithm converts r_i to an unconstrained unresolved
reference. In Figure 6, r_2 would initially have been a constrained unresolved reference but has
been converted to an unconstrained unresolved reference because there are no pinned addresses
in [0x4000F0, 0x4000F0+5).
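
The unconstraining step can be sketched as a single pass over the sorted pinned addresses. The
function name, the array representation, and the treatment of the last pinned address are
assumptions made for illustration. The prose above says "greater than s" while the interval in
the example admits a gap of exactly s; this sketch follows the interval form, treating a
reference as unconstrained when no other pinned address lies in [a, a+s).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FULL_JUMP_SIZE 5u   /* s for 32-bit x86: a 5-byte jump reaches any address */

/* pins[] holds the pinned addresses in ascending order. On return,
 * unconstrained[k] is 1 when a reference of s bytes fits at pins[k]
 * without overlapping the next pinned address, and 0 otherwise. */
void mark_unconstrained(const uint32_t *pins, size_t n, uint32_t s,
                        int *unconstrained)
{
    size_t k;
    assert(pins != NULL && unconstrained != NULL);
    for (k = 0; k < n; k++) {
        if (k + 1 == n) {
            /* Assumption: free space follows the last pinned address. */
            unconstrained[k] = 1;
            continue;
        }
        assert(pins[k] < pins[k + 1]);
        unconstrained[k] = (pins[k + 1] - pins[k] >= s) ? 1 : 0;
    }
}
```

With pinned addresses 0x4000EE and 0x4000F0, the first reference stays constrained (only two
bytes are available) and the second becomes unconstrained, matching the situation described for
Figure 6.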

For every remaining constrained unresolved reference, r, that references instruction i, a new
unresolved unconstrained dollop reference, r', is added in the modified program’s address space.
r is resolved to r' through one or more intermediate references and r' is set to reference i. This is a
process known as chaining [16]. In Figure 6, r_1 is a reference to an instruction in d_2 that is
chained through r_3.

At this stage of the reassembly, all unresolved references are unconstrained. Besides the
information from the IRDB, the reassembly algorithm relies on three data structures:

•  uDR: The list of unresolved references,

•  D: A list of unplaced dollops (this list is initially empty),

•  M: A mapping between instructions and their location in the modified program’s address
space.

The final stage of the reassembly algorithm is iterative: every unresolved reference, r_i, to
instruction i in list uDR is considered in turn until the list is empty.

Reference r_i is handled in one of two ways. Either i is already placed in the modified
program’s address space or it is not. In the former case, the reassembly algorithm simply emits a
resolved unconstrained reference to M[i]. r_i is removed from uDR and the loop continues. The
latter case is more involved: a dollop containing i must be retrieved or constructed and then
placed. The reassembly algorithm searches D for d, the dollop containing i.

If no dollop is found, the reassembly algorithm constructs a dollop that contains i. The
dollop construction process is straightforward. Dollop d begins with instruction i_1 and includes
i_1’s fallthrough i_2, i_2’s fallthrough i_3, and so on. The last instruction in d, i_n, is the first
instruction that has no fallthrough. The reassembly algorithm places the instructions of d linearly
in a consecutive block of addresses. In Figure 6, d_1 is a placed dollop.

When there is no block of free space big enough to accommodate the instructions of d, the
dollop may be split. Furthermore, large dollops may be split to fill small blocks of free space.
Dollop d of instructions {i_1, ..., i_n} is split by choosing a split point, i_s. Dollop d is truncated to
contain instructions {i_1, ..., i_{s-1}} and d' is built to contain instructions {i_s, ..., i_n}. An
unconstrained unresolved reference r that references i_s is appended to the end of d. The
unresolved reference r is added to uDR and d' is added to D.

After d is placed, r_i is resolved and the map M is updated for all instructions in d. Any other
unresolved references that target instructions in d are resolved as well. In Figure 6, r_2 is resolved
to d_1.

An instruction i_j in the just-placed dollop d may reference another instruction, i_k. If i_k is
already placed, that reference is resolved immediately. Otherwise, an unresolved reference is
created that references i_k and is added to uDR.

The modified program is completely reassembled when uDR is empty.

Figure 6: The reassembly algorithm in the process of reassembling a modified program.

Again, Figure 6 illustrates these concepts in the context of a program being reassembled. In
this example we will describe Helix/Kevlar, the prototype implementation of our technique for
the x86 ISA where there are variable length instructions. A jump, the implementation of
references for this target, can be as short as two bytes (program control transfer is constrained
to nearby locations) and as long as five (program control can be transferred anywhere in the
program).

In this example there are two pinned addresses, two dollops and three references. Dollop d_1
is already placed; dollop d_2 is not. Reference r_1 began as a constrained unresolved reference to
an instruction in dollop d_2. For expository purposes, assume that d_2 could not be placed at an
address that is addressable in 2 bytes from r_1. Jump chaining was used and reference r_1 was
resolved to r_3, and r_3 became an unresolved unconstrained reference to the instruction in d_2.
Because dollop d_1 is already placed, reference r_2 is resolved. Reference r_2 began as a
constrained reference but because there were no pinned addresses in [0x4000F0, 0x4000F5), r_2
was converted to an unconstrained reference.
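
The iterative stage of reassembly can be summarized with a deliberately simplified sketch. It
keeps the worklist uDR and the map M from the description above, but it assumes unlimited
contiguous free space, so dollops are never split and the list D of unplaced dollops is not
needed. The Insn structure, the bump allocation from free_base, and the choice to end a dollop
with a 5-byte jump when its fallthrough is already placed are all assumptions made for
illustration, not details of the Helix/Kevlar implementation.

```c
#include <assert.h>
#include <stdint.h>

#define NONE (-1)
#define MAX_REFS 64
#define JUMP_SIZE 5u            /* size of an unconstrained x86 jump */

typedef struct {
    uint32_t size;              /* encoded size in bytes */
    int fallthrough;            /* index of the next instruction, or NONE */
    int target;                 /* index of a referenced instruction, or NONE */
} Insn;

/* Places every instruction reachable from the pinned instructions.
 * M[i] receives the new address of instruction i (0 means not placed).
 * Returns the number of dollops that were constructed and placed. */
int reassemble(const Insn *ir, int n, const int *pinned, int n_pinned,
               uint32_t free_base, uint32_t *M)
{
    int uDR[MAX_REFS];          /* worklist of unresolved references */
    int head = 0, tail = 0, dollops = 0;
    uint32_t next_free = free_base;

    assert(free_base != 0);     /* 0 is reserved to mean "not placed" */
    for (int k = 0; k < n; k++)
        M[k] = 0;
    for (int k = 0; k < n_pinned; k++) {
        assert(tail < MAX_REFS);
        uDR[tail++] = pinned[k];
    }

    while (head < tail) {
        int i = uDR[head++];
        if (M[i] != 0)
            continue;           /* already placed: resolves directly to M[i] */

        dollops++;              /* build a dollop by following fallthroughs */
        for (int cur = i; cur != NONE; cur = ir[cur].fallthrough) {
            if (M[cur] != 0) {  /* fallthrough placed earlier: end with a jump */
                next_free += JUMP_SIZE;
                break;
            }
            M[cur] = next_free;
            next_free += ir[cur].size;
            if (ir[cur].target != NONE && M[ir[cur].target] == 0) {
                assert(tail < MAX_REFS);
                uDR[tail++] = ir[cur].target;   /* new unresolved reference */
            }
        }
    }
    return dollops;
}
```

The loop terminates when the worklist is exhausted, which corresponds to the condition that the
modified program is completely reassembled when uDR is empty.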


We have evaluated Zipr’s robustness and efficiency using SPEC2006 on 32- and 64-bit
machines. Zipr was able to properly transform every application in the benchmark suite. All
quantitative results presented are from 32-bit executions of SPEC2006 on a test host with a
quad-core 3.0GHz CPU (AMD Phenom II X4 B55) and 4GB of RAM that ran Ubuntu 10.04 LTS with
version 4.4.3 of the GCC compiler suite. The baseline results for comparison were generated
with a native run of SPEC2006 on that same host. All values presented in Figure 7 are
normalized against those baseline results.

To quantify the overhead of the Helix/Kevlar technique itself, we compared the performance
and size of the C-based SPEC applications before and after applying a user-specified Null
Transformation. The Null Transformation is the most basic transformation. In fact, it is not a
transformation at all. It is simply a no-op modification invoked on the IR during the
User-specified Transformation stage of the Transformation Phase. In other words, the original
and the modified programs are semantically equivalent. Therefore, any change in program size
(on disk) or performance is a consequence of the rewriting algorithm per se.

Figure 7: Inherent overhead of programs transformed using Helix/Kevlar prototype
implementation.

The left bars of Figure 7 show the size overhead for the C-based SPEC2006 benchmark
applications when modified with the Null Transformation. On average, the modified binary
programs are 4% larger than the original. The right bars of Figure 7 show the performance
overhead for each of the C-based SPEC2006 benchmark applications when modified with the
Null Transformation. On average, the modified binary programs execute 5% slower than the
original.

To reiterate, although we are presenting only the results for the C-based SPEC applications,
Helix/Kevlar correctly transforms all of the SPEC2006 benchmarks whether they are
implemented in C, C++ or Fortran. While there is ongoing work to improve the performance of
the prototype implementation, the overall results validate the feasibility and inherent
efficiency of the rewriting technique.

Rodes et al. [21] describe an SLX program transformation (hereafter referred to as PI) that
“[defends] binaries against intra-frame stack-based attacks, including overflows into local
variables” even without access to the program’s source code. PI “[applies] a combination of
transformations, including variable reordering, random-sized padding between variables, and
placement of canaries.”


PI was initially implemented for Strata, using the same API that Helix/Kevlar exposes to
its users to rewrite programs statically. Helix/Kevlar applied PI to a subset of the C-based
SPEC2006 benchmark applications using the very same implementation. Therefore, the resulting
statically rewritten programs benefitted from all the security afforded by PI without the
additional requirement of a runtime engine and without reimplementation.

Figure 8: Overhead of PI.

The left bar of Figure 8 shows that the Helix/Kevlar methodology can add the security
protection afforded by stack layout transformation with a less than 6% increase in the program’s
on-disk size, on average. The right bar of Figure 8 shows that adding the security of PI through
the Helix/Kevlar methodology incurs less than a 5% performance penalty. Our experiments show
GCC’s built-in stack protection mechanism increases on-disk size by 4% and adds more than 5%
to execution time on these same benchmarks, demonstrating Zipr’s efficiency.

3.3 Helix/Kevlar Protections

The Helix/Kevlar architecture is flexible and powerful. It can dynamically apply a wide range of
diversity transformations on a running binary, it can check and enforce various program
properties that have been extracted from the binary or specified by an administrator, and it can
insert remediation code. The following sections describe some of the Helix/Kevlar protections.

3.3.1 Instruction Location Transformation (ILX)

A powerful diversity technique is to randomize the location of code so an attacker has difficulty
precisely locating targets of attack (e.g., entry points to functions, tables of pointers to functions,
etc.). For example, most systems now routinely use Address Space Layout Randomization
(ASLR) to make exploiting weaknesses difficult [35]. ASLR has several positive attributes. It is
cheap to apply, incurring little or no run-time overhead, and it can be applied to any binary.
It is applied automatically: no user intervention or action is necessary.

Unfortunately,  ASLR  implementations  have  low  entropy.  ASLR  on  a  32-bit  architecture 
only  provides  16  bits  of  entropy.  Furthermore,  ASLR  is  not  applied  universally  throughout  the 
address  space.  Even  when  using  dynamically-linked  libraries,  it  is  common  for  the  main  program 
text  to  start  at  a  known  fixed  location.  Because  of  these  limitations,  ASLR-protected  code  is 
subject  to  attack  [5,  22,  27]. 

Instruction Location Transformation (ILX) is a technique that seeks to scatter instructions in
a program randomly throughout the address space. In contrast to ASLR, ILX provides 31 bits of
entropy on a 32-bit machine. Furthermore, ILX is applied universally to all segments. Thus, the
major limitations of ASLR are eliminated.

Figure  9  conceptually  illustrates  ILX.  The  top  left  of  the  figure  shows  the  control-flow  graph 
of  a  particular  program  segment.  The  compiler  and  the  linker  collaborate  to  produce  an 
executable  file  where  instructions  are  laid  out  so  they  can  be  loaded  into  memory  when  the 
program  is  executed.  A  typical  layout  of  code  is  shown  at  the  bottom  left  of  the  figure.  The  right 
side  of  the  figure  shows  the  layout  of  the  code  when  ILX  is  applied. 


Figure 9: ILX code example

To link instructions together, an ILX Sprocket contains a fallthrough map shown at the top
right of Figure 9. This map uses SPRI to specify the execution successor of each instruction in
the program.

Together, we call the fallthrough map and randomized instruction locations an ILX
Sprocket. Figure 10 shows how an ILX program could be created. First, STARS detects the
instructions and functions in a program. Next, the program is analyzed for indirect branch targets,
call sites, branches, etc. Finally, the reassembly engine uses this information to relocate the entire
program with each instruction in a randomized location, and creates the fallthrough map.

To execute the randomized program, we use Strata to fetch and execute the instructions from
the ILX Sprocket. Strata interprets the fallthrough map to fetch and execute instructions on the
host hardware. Preliminary results indicate that this process can be made very efficient. The
preliminary prototype achieved only 13% runtime overhead on the SPEC2006 benchmark suite.
Furthermore, randomly scattering instructions throughout the address space significantly reduces
the attack surface for mounting any arc-injection attacks, including attacks based on
return-oriented programming techniques. Hiser et al. provide a more complete discussion of the
benefits of ILX and full details of its implementation [11].
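
The role of the fallthrough map can be illustrated with a small sketch in which each instruction
lives at a randomized address and the map, rather than address order, names its execution
successor. The table layout and function names are assumptions made for illustration; the sketch
does not reproduce SPRI syntax or the way Strata actually interprets the map.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t addr;   /* randomized location of an instruction */
    uint32_t next;   /* location of its execution successor, 0 if none */
} FallthroughEntry;

/* Look up the successor of the instruction at addr (0 if unknown or none). */
uint32_t successor(const FallthroughEntry *map, size_t n, uint32_t addr)
{
    size_t k;
    assert(map != NULL);
    for (k = 0; k < n; k++)
        if (map[k].addr == addr)
            return map[k].next;
    return 0;
}

/* Follow the map from entry, recording each visited address in out[].
 * Returns the number of addresses written (at most max_out). */
size_t trace(const FallthroughEntry *map, size_t n, uint32_t entry,
             uint32_t *out, size_t max_out)
{
    size_t count = 0;
    uint32_t pc = entry;
    assert(out != NULL);
    while (pc != 0 && count < max_out) {
        out[count++] = pc;
        pc = successor(map, n, pc);
    }
    return count;
}
```

Because consecutive instructions are no longer adjacent in memory, knowing the address of one
instruction tells an attacker nothing about where its neighbors are located.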


Figure 10: ILX Static Analysis

3.3.2 Stack Layout Transformation (SLX)

Common targets of malicious attacks are locations on the stack (e.g., return addresses, frame
pointers, function pointers, and critical data). SLX is a transformation that is applied to a running
application to dynamically randomize the location of variables on the stack and place canaries to
determine if an attack has been attempted.

Transformation of the stack frame layout for a function requires determination of:


1. The current layout of the stack frame, e.g., the addresses and sizes of various stack data
objects (incoming arguments, saved registers, return address, local variables, outgoing
arguments)

2. The instructions that generate an address of a data object on the run-time stack.

In  principle,  if  this  information  were  available,  the  layout  of  the  stack  frame  could  be 
changed  and  the  instructions  that  generate  stack  addresses  modified  to  reflect  the  new  layout.  The 
new  layout  of  the  stack  frame  could  be  based  on  any  security-relevant  criteria,  e.g.,  memory 
objects  could  be  placed  in  random  order,  padding  introduced  before,  after  or  within  the  stack, 
canaries  included,  variables  promoted  to  the  heap,  etc.  While  this  information  is  readily  available 
to  the  compiler  when  given  a  program  in  source  code  form,  Helix/Kevlar  must  recover  this 
information  solely  based  on  the  binary  representation. 

In our approach, static analysis (STARS) is used to determine all the necessary details of the
binary program. However, when starting with a binary program, precise determination of the
stack layout and the instructions that generate stack addresses for any given function is
problematic (indeed, even the very basic notion of a function is problematic at the binary level).
Modern compilers employ a wide range of techniques to minimize both the use of storage and
program execution time. The result is binary programs with unpredictable structures.

Our approach to determination of the stack layout and the instructions that reference the
stack is based on two assumptions about addressing: (a) the predominant mechanism by which
instructions access stack variables is through scaled or direct addressing based on an offset
indicating the variable starting location, and (b) where indirect addressing is used, that use is for
access to variables whose locations can be inferred from previous direct or scaled addressing.
Starting with these assumptions, layout inferences are produced using a set of simple heuristics
that rely upon additional assumptions concerning: (c) the manner in which the stack is allocated
and deallocated, and (d) the general stack frame layout.

The  assumptions  listed  above  do  not  necessarily  hold  (although  assumptions  (c)  and  (d)  hold 
for  binaries  produced  by  C/C++  compilers  that  use  the  cdecl  x86  calling  convention).  Indeed 
through  BED  and  TSET,  the  Helix/Kevlar  architecture  explicitly  compensates  for  any 
transformations  that  might  rely  on  erroneous  information.  Our  approach  to  stack  layout 
transformation  is  speculative.  Initial  inferences  about  the  stack  are  created,  and  these  inferences 
are  then  evaluated  and  refined  if  necessary  to  ensure  that  they  preserve  the  program’s  semantics. 

In our current approach, we limit transformations to placement of memory objects in random
order and the introduction of random length padding. Vetting of these transformations is by
testing with BED and TSET. Furthermore, we use diversity as an error amplification technique
to detect bad stack layout inferences. The basic idea behind error amplification is as follows: if a
hypothesized stack layout inference is correct, then any semantic-preserving transformations
should result in a correct program variant. We can therefore develop a multitude of such
transformations, e.g., permutation of the order of variables, and vet each of the variants. If any of
them fail, and assuming that our transformation is correctly implemented, we can then not only
reject the variant but also the inferred stack layout. Note that this process is the exact opposite of
using testing to validate optimizing compiler transformations: here the transformation is assumed
to be correct, and a failing test discredits the inferred layout.

For  each  detected  function  in  a  binary,  SLX  randomizes  the  stack  layout  using  an  aggressive 
inference  to  reorder  variables,  e.g.  using  offsets  in  the  disassembly  of  the  program  to  infer 
variables.  If  the  tests  are  passed,  we  use  error  amplification  before  creating  the  final  variant.  The 
layout  is  randomized  a  second  time  and  the  resulting  program  tested  again.  If  the  tests  are  passed 
following  the  second  randomization,  a  third  randomization  is  effected  that  reorders  the  stack 
elements  and  places  padding  between  stack  objects.  If  the  tests  are  passed  with  this 
randomization,  then  the  transformation  is  assumed  to  be  satisfactory,  and  the  analysis  continues 
with  the  next  function. 

If one or more tests fail during analysis of a function, the inference about the stack layout is
abandoned, and a simpler, less aggressive inference is used. The least aggressive inference
besides not changing the function at all is one in which the entire stack frame is relocated but the
order of variables is left unchanged. Preliminary work suggests that reordering variables is an
effective error amplification technique, as reordering misidentified variables will most likely
result in a program crash. Thus, three rounds of error amplification appear sufficient to vet SLX
transformations.
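
The speculative vetting loop described above can be sketched as follows. Inference levels are
tried from most to least aggressive, and a level is accepted only if three successive randomized
variants pass testing. The numeric levels, the callback signature, and demo_oracle (a stand-in
for building a variant and running the BED and TSET checks) are hypothetical and exist only to
make the sketch self-contained.

```c
#include <assert.h>
#include <stddef.h>

#define AMPLIFICATION_ROUNDS 3

/* Returns nonzero if the variant built from the given inference level
 * and randomization round passes its tests. */
typedef int (*variant_test_fn)(int level, int round, void *ctx);

/* levels[] lists candidate inferences, most aggressive first.
 * Returns the first level whose variants pass every round, or
 * fallback (leave the function untransformed) if none does. */
int choose_inference(const int *levels, size_t n_levels,
                     variant_test_fn passes, void *ctx, int fallback)
{
    size_t k;
    assert(levels != NULL && passes != NULL);
    for (k = 0; k < n_levels; k++) {
        int ok = 1;
        for (int round = 0; round < AMPLIFICATION_ROUNDS; round++) {
            if (!passes(levels[k], round, ctx)) {
                ok = 0;          /* one failure discredits the inference */
                break;
            }
        }
        if (ok)
            return levels[k];
    }
    return fallback;
}

/* Demonstration oracle: inferences above *max_safe are wrong, but the
 * error only shows up on the second randomization (round 1). */
int demo_oracle(int level, int round, void *ctx)
{
    int max_safe = *(const int *)ctx;
    return level <= max_safe || round < 1;
}
```

The demonstration oracle shows why more than one round is used: a wrong inference can pass a
single randomization by luck, and re-randomizing amplifies the error until a test fails.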

Our current approach has been evaluated on a variety of benchmarks, and the results are
promising [21]. The use of BED and TSET resulted in binaries whose functions were
transformed with different levels of aggressiveness, ranging from no transforms, to transforms
that reordered a subset of the local variables, and in some cases, we were able to infer and reorder
all local variables on the stack frame. The ability to reorder stack variables for security purposes
is standard in some compilers, e.g., the ProPolice extension to gcc reorders buffers higher in
memory than other variables to prevent local overflows [7]. Our results demonstrate that we can
enable similar transformations but using only binaries.

3.3.3 Heap Randomization and Transformation (HLX)

Helix/Kevlar’s Heap Layout Transformation (HLX) provides protection against a variety of
common memory errors, such as buffer-overflows, use-after-free, and double-free errors. It
achieves these protections by detecting (using STARS analysis) memory allocations within the
program and rewriting the allocations to randomly increase the allocation size. HLX also detects
memory deallocation sites and maintains a pool of objects that were marked as free. When
additional memory is needed (such as when the free object pool has become too large), the free
pool is checked and objects are randomly selected for deallocation.

Table 1 provides an example. The left portion shows unprotected source code that allocates
a buffer, and uses that buffer to manipulate input. Unfortunately, the code has an off-by-one error,
and allocates too few bytes to hold the newly formed string, perhaps because additional
characters were added to the manipulation, but the size of the buffer was not updated. The
right side of the table shows how HLX would transform the program. The amount of memory
allocated gets increased by a random amount, and free pool management code is inserted. In this
case, the off-by-one error is converted from a possibly crash-inducing bug into a fully-correct
program. While not all programs can be completely repaired, the transform still prevents exploits
because an attacker cannot reliably predict where heap items may be located, or what size a
buffer might be to predictably overrun the buffer.

Table 1: Example without and with HLX, respectively

Without HLX:

    int size = strlen(input);
    char* newValue = malloc(size+1);
    strcpy(newValue, input);
    sprintf(newValue, "%s!\n", input);
    log("The input is %s", newValue);
    free(newValue);

With HLX:

    int size = strlen(input);
    cleanup_free_pool();
    char* newValue = malloc(random_increase(size+1));
    strcpy(newValue, input);
    sprintf(newValue, "%s!\n", input);
    log("The input is %s", newValue);
    add_to_free_pool(newValue);


Unlike this source-level example, however, HLX (like the rest of Helix/Kevlar) applies
high-entropy randomization directly to a binary program for which no source code is available.
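
To make the helper names in Table 1 concrete, the following is a minimal source-level sketch of what such helpers could look like. It is not the Helix/Kevlar implementation, which operates on binaries. The padding bounds (MIN_PAD, MAX_PAD), the pool capacity (POOL_CAP), the eviction policy, and the hlx_format wrapper are all assumptions made for illustration, and rand() stands in for a high-entropy random source.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MIN_PAD   8   /* assumed lower bound on random padding, in bytes */
#define MAX_PAD  64   /* assumed upper bound on random padding, in bytes */
#define POOL_CAP  8   /* assumed capacity of the deferred-free pool */

static void  *free_pool[POOL_CAP];
static size_t pool_count = 0;

/* Return the requested size plus MIN_PAD..MAX_PAD bytes of random padding. */
static size_t random_increase(size_t size) {
    return size + MIN_PAD + (size_t)(rand() % (MAX_PAD - MIN_PAD + 1));
}

/* Release one randomly chosen pooled object back to the allocator. */
static void evict_random(void) {
    assert(pool_count > 0);
    size_t victim = (size_t)rand() % pool_count;
    free(free_pool[victim]);
    free_pool[victim] = free_pool[--pool_count];
}

/* Defer the release of an object instead of freeing it immediately,
 * so that a dangling pointer does not immediately alias a new object. */
static void add_to_free_pool(void *p) {
    if (p == NULL) return;
    if (pool_count == POOL_CAP) evict_random();  /* make room if full */
    free_pool[pool_count++] = p;
}

/* When the pool has grown past half its capacity, release random objects. */
static void cleanup_free_pool(void) {
    while (pool_count > POOL_CAP / 2) evict_random();
}

/* Mirrors the second listing of Table 1: the buffer is requested with
 * size+1 bytes although "%s!\n" needs size+3; the random padding
 * (at least MIN_PAD bytes here) absorbs the overrun. */
static char *hlx_format(const char *input) {
    size_t size = strlen(input);
    cleanup_free_pool();
    char *new_value = malloc(random_increase(size + 1));
    if (new_value == NULL) return NULL;
    sprintf(new_value, "%s!\n", input);
    return new_value;
}
```

A caller would use hlx_format(input) in place of the allocation and formatting steps, and pass the result to add_to_free_pool rather than free when done. Because the padding is always at least MIN_PAD bytes in this sketch, the two-byte overrun is always absorbed; a real deployment would trade padding size against memory overhead.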

3.3.4 Instruction Set Randomization (ISR)

A common and very dangerous form of security attack involves exploiting a vulnerability to
inject malicious code into an executing application and then cause the injected code to be
executed. A theoretically strong approach to defending against any type of code-injection attack
(irrespective of the vulnerability) is to create and use a process-specific instruction set that is
generated by a randomization algorithm. Code injected by an attacker who does not know the
randomization key will be invalid for the randomized processor, effectively thwarting the attack.

Helix/Kevlar takes advantage of ISR to help defeat these kinds of attacks. ISR uses
Helix/Kevlar’s static analysis and runtime support to identify code locations and encrypt them
during a process’s loading procedure. Helix/Kevlar’s version of ISR is based on the first practical
version of ISR [12]. Helix/Kevlar further extends this technology to monitor the application for
dynamically loaded code (shared objects or DLLs) and encrypts that code as it enters the runtime
system.
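
The principle behind ISR can be illustrated with a small sketch. Here a repeating-key XOR stands in for the encryption algorithm; this is an assumption chosen for brevity and is not the cipher used by Helix/Kevlar. The function names isr_xor and isr_fetch are hypothetical. Legitimate code is encrypted at load time and decrypted just before execution, so bytes injected by an attacker (which were never encrypted) decode to something other than what the attacker intended.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encrypt or decrypt a code region in place by XOR with a repeating key.
 * XOR is symmetric, so the same routine serves both directions. */
static void isr_xor(uint8_t *code, size_t len,
                    const uint8_t *key, size_t key_len) {
    assert(key != NULL && key_len > 0);
    for (size_t i = 0; i < len; i++)
        code[i] ^= key[i % key_len];
}

/* Model of the fetch path: copy bytes from memory and decrypt them into
 * `out`, which is what the (virtual) processor would then execute. */
static void isr_fetch(const uint8_t *mem, size_t len,
                      const uint8_t *key, size_t key_len, uint8_t *out) {
    memcpy(out, mem, len);
    isr_xor(out, len, key, key_len);
}
```

In use, the loader would call isr_xor once on each code region with a per-process key, and the runtime would apply isr_fetch before executing any bytes. Code that went through the loader decodes to its original form; code that bypassed it does not.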

Helix/Kevlar’s code-injection security can be further enhanced by configuring it as a
metamorphic shield [18]. The metamorphic shield (MMS) technology not only randomizes the
code at program startup, but also periodically re-randomizes the code’s encryption characteristics
while the program is running. Such re-randomization prevents attackers from exhaustively
searching for the encryption key, keeping the program safe from code injection by even the most
determined attackers.

3.3.5 PC Confinement (PCC)


ISR provides diversity which prevents malicious code from being injected into the running
application, but an attacker may still be able to re-use code that is already in the application to
enact security violations; this is called an arc-injection attack [20]. Indirect branches, such as
function calls via function pointers and function return instructions, are vulnerable to such an
attack if the branch’s data is overwritten via a buffer overflow, format string issue, or other
program weakness. Table 2 contains an example of a possible arc-injection attack.

Table 2: Example of arc-injection attack

    void main() {
        auth = authenticate();
        vulnerable_code();
        if (auth) {
            send_secret_data();
        }
    }


In the table, if the code in vulnerable_code() can overwrite the function’s return
address (or even just part of the function’s return address via a partial overwriting attack [1]),
the return instruction can possibly jump anywhere in the program. It may jump to the system()
function to execute shell commands, or to the send_secret_data() call to more stealthily violate
the application’s security policy.

Helix/Kevlar’s ILX feature can defeat many of these attacks by randomizing the
application’s code. However, Strata’s translation and Sprocket execution code are at static
locations, which may still be targets of arc-injection attacks. Helix/Kevlar can protect all
statically located code by employing PC confinement (PCC). PCC is a type of program
shepherding where indirect branches are monitored and only allowed to transfer control to
ILX-randomized code [14].

Helix/Kevlar’s  static  analysis  identifies  the  location  of  valid  indirect  control  transfers,  and 
the  run-time  environment  efficiently  monitors  the  execution  of  indirect  branches.  Indirect  control 
flow  is  only  allowed  if  the  destination  is  acceptable.  Consequently,  PCC  and  ILX  can  disallow 
control  transfers  to  unrandomized  code,  such  as  Strata’s  Sprocket  execution  code,  thereby 
eliminating  the  vast  majority  of  arc-injection  attacks. 
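
The run-time check at the heart of PCC can be sketched as a range test on each indirect-branch target. The following is an illustration only, not Helix/Kevlar’s actual monitor: the code_range type, the pcc_target_allowed function, and the linear search over permitted ranges are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A half-open address range [start, end) of randomized code that
 * indirect branches are permitted to target. */
typedef struct {
    uintptr_t start;
    uintptr_t end;
} code_range;

/* Return true only if `target` falls inside one of the permitted ranges.
 * A monitor would call this before every indirect call, jump, or return
 * and terminate the program when the check fails. */
static bool pcc_target_allowed(uintptr_t target,
                               const code_range *allowed, size_t n) {
    assert(allowed != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (target >= allowed[i].start && target < allowed[i].end)
            return true;
    }
    return false;
}
```

Under this sketch, the statically located translator code would simply be left out of the permitted ranges, so an overwritten return address pointing into it would be rejected. A production monitor would likely use a faster lookup structure than a linear scan.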

3.4 Technology Communication

As  part  of  our  approach  to  transitioning  the  Helix/Kevlar  technology,  we  participated  in  a  variety 
of  meetings,  prepared  various  publications,  and  gave  several  presentations  during  this  period. 

Our most significant and visible communications are:

• At the request of Patrick Hurley of AFRL, we provided a demonstration copy of
Helix/Kevlar to members of the Technical Cooperation Program (TTCP). The partners
include Australia, New Zealand, Canada, the United Kingdom, and the United States.

• We presented a paper, Security Protection of Binary Programs, at the 10th IET System
Safety and Cyber-Security Conference in Bristol, UK on October 21, 2015 through
October 22, 2015.

• Jack Davidson presented a briefing on Helix technology to the United States Postal Service
(USPS) on October 16, 2015.

• Jack Davidson was an invited participant in the Moving Target Workshop held at George
Mason University on August 31, 2015 through September 1, 2015. The title of his talk was
"Evaluating the Effectiveness of the Helix’s Metamorphic Shield."

• Anh Nguyen-Tuong met with Azbil, a maker of industrial automation and control products,
to discuss the use of Helix/Kevlar to protect industrial systems.

• Jack Davidson made a presentation on cyber security research to the Department of
Computer Science’s Industrial Advisory Board on July 16, 2015. Representatives of
Capital One, Palantir, Appian, Lockheed Martin, and Excella were in attendance.

• We presented a paper, Joza: Hybrid Taint Inference for Defeating Web Application SQL
Injection Attacks, at the 45th Annual IEEE/IFIP International Conference on Dependable
Systems and Networks in Rio de Janeiro, Brazil on June 22-25, 2015.

• Jack Davidson presented a technology briefing to Northrop Grumman on May 28, 2015.

• Anh Nguyen-Tuong presented a briefing on Helix technology to the Airbus cyber
security group on October 19, 2015.

4.0 RESULTS AND DISCUSSION

We present our results for each reporting period of the project in the following subsections.

4.1  Period  1:  19-FEB-2015  through  30-JUN-2015 

4.1.1 Progress Against Planned Objectives

A major objective of this effort is to demonstrate the technical readiness of portions of the Helix
technology developed under contracts FA8650-10-C-7025, FA8650-13-2-0096, and others.
Towards this goal, we provided Northrop Grumman with a virtual machine image that contained
the complete Helix tool chain, directions for applying the tools to applications, and sample
programs that had already been protected. We worked with Northrop Grumman personnel
(Russ Hall and Kevin Reynolds) to enable Northrop Grumman to perform red team attacks
against protected applications to demonstrate the effectiveness of the Helix protections. We
expect the results of this exercise in the near future (i.e., within a few weeks).

4.1.2 Technical Accomplishments this Period

There were several major technical accomplishments this period. First, we completed the
retargeting of Strata, our software dynamic translator, to Windows (64-bit). Strata is a key
component of Helix in that it provides the capability to dynamically translate code. This
capability makes it possible to apply various diversity transformations and to shift the attack
surface if desired. Akshay Joshi (asj5b@virginia.edu) is doing this work as part of his
Ph.D. research.

A second major technical accomplishment was to use Zipr, our static binary rewriting
technology, to rewrite, diversify, and protect several interpreters and JIT-based runtimes, such as
the main executables of the Java VM, Python, and Node.js (the JavaScript runtime).

A  third  major  technical  accomplishment  was  to  demonstrate  that  Zipr  was  able  to  apply  and 
compose  diversity  transformations  (i.e.,  block-level  location  randomization  and  stack  padding)  to 
a  small  JIT  program. 

A fourth major accomplishment was a demonstration that Zipr with a moving target defense
could defeat blind ROP attacks. A paper is being written for submission to a major security
conference. Will Hawkins (whh8b@virginia.edu) is doing this work as part of his Ph.D.
research.

4.1.3  Improvements  to  Prototypes  This  Period 

We enhanced Zipr to handle the main executable of the Java VM and other interpreters such as
Python and JavaScript.

We continued to fix both software errors and performance and scalability issues throughout
the Helix tool chain (e.g., STARS, Strata, Zipr, etc.).

4.2 Period 2: 01-JUL-2015 through 15-AUG-2015

4.2.1 Progress Against Planned Objectives

A  major  objective  of  this  effort  is  to  demonstrate  the  technical  readiness  of  portions  of  the  Helix 
technology  developed  under  contract  FA8650-10-C-7025,  FA8650-13-2-0096,  and  others. 
Towards  this  end,  we  have  been  improving  our  technology  to  work  on  the  Java  VM  (JVM)  and 
the  Windows  7  platform. 

Our focus with Java has been on the Ubuntu platform, with the ultimate goal of using this for
GCCS under Solaris. To achieve our goal, we have been using Zephyr’s static binary rewriter, Zipr,
to effect changes in binary programs. Zipr’s main benefit is that it is interoperable with dynamic
code generation (just-in-time compilation, also known as JITting). We have block-level ILR
working on the JVM core components, and initial prototypes working on Solaris with smaller
programs.

On Windows, our core analysis and dynamic transformation engines are working with all
major program features (such as threads and exception handling).

4.2.2  Technical  Accomplishments  this  Period 

There were several major technical accomplishments this period. First, we enhanced the port of
Strata, our software dynamic translator, to Windows (64-bit). Strata is a key component of Helix
in that it provides the capability to dynamically translate code. This capability makes it possible
to apply various diversity transformations and to shift the attack surface if desired. Akshay Joshi
(asj5b@virginia.edu) is doing this work as part of his Ph.D. research. Another student,
Jian Xiang (jx5c@virginia.edu), is working on benchmarking Strata under Windows to identify
and fix performance issues.

A second major technical accomplishment was to use Zipr, our static binary rewriting
technology, to rewrite and diversify the core shared libraries of the Java virtual machine. A major
component of the JVM is libjli.so, and we have achieved block-level ILR protection for it.
Block-level ILR relocates all of the basic blocks in a program so that each machine can have a
randomized binary to help break the software monoculture. This diversity transform helps defend
against attacks that rely on knowing the location of key code sections. To ensure that
performance overheads are acceptable, we have been working on benchmarking Zipr on Ubuntu
with the help of a student, Will Hawkins (whh8b@virginia.edu).
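
The core idea of block-level relocation can be sketched in a few lines. This is an illustration under simplifying assumptions, not Zipr’s algorithm: the basic_block record, the ilr_layout and ilr_translate functions, the contiguous layout, and the use of rand() are all hypothetical. A real rewriter must also patch every branch and data reference to use the new addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint32_t orig_addr;  /* address of the block in the original binary */
    uint32_t size;       /* size of the block in bytes */
    uint32_t new_addr;   /* address assigned after randomization */
} basic_block;

/* Shuffle the block order (Fisher-Yates) and lay the blocks out
 * contiguously from `base` in the shuffled order. */
static void ilr_layout(basic_block *blocks, size_t n, uint32_t base) {
    assert(blocks != NULL || n == 0);
    for (size_t i = n; i > 1; i--) {
        size_t j = (size_t)rand() % i;   /* pick from the unshuffled prefix */
        basic_block tmp = blocks[i - 1];
        blocks[i - 1] = blocks[j];
        blocks[j] = tmp;
    }
    uint32_t addr = base;
    for (size_t i = 0; i < n; i++) {
        blocks[i].new_addr = addr;
        addr += blocks[i].size;
    }
}

/* Map an original block start address to its randomized address.
 * Returns 0 if `orig_addr` is not the start of a known block. */
static uint32_t ilr_translate(const basic_block *blocks, size_t n,
                              uint32_t orig_addr) {
    for (size_t i = 0; i < n; i++) {
        if (blocks[i].orig_addr == orig_addr)
            return blocks[i].new_addr;
    }
    return 0;
}
```

Seeding the random source differently on each machine yields a different layout per installation, which is the property that breaks the monoculture an attacker relies on.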

We have further improved our development infrastructure so that the Ubuntu, Windows, and
Solaris versions are all integrated into one version of the source. This integration allows bug fixes
to be applied immediately to all versions of the software. Further, our testing infrastructure now
includes nightly tests for Zipr, and we are working to include nightly Windows and Solaris
testing.

4.2.3  Improvements  to  Prototypes  This  Period 

On Ubuntu platforms, we enhanced Zipr to handle shared objects, including the core shared
objects of the Java VM. The core of the JVM is a 12MB shared object called libjli.so, which
makes up over 90% of the JVM’s functionality. We have tested the rewritten JVM against jEdit, a
production-quality, GUI-based, full-featured text editor; full functionality is retained. Initial
work on Zipr for Solaris is promising: we have been able to rewrite small executables
successfully.

We further continued development on the Windows platform. The dynamic translation
execution engine now supports all major features of Windows executables (e.g., threads,
exception handling, signal delivery, etc.). Further, the analysis engine is now capable of
analyzing Windows PE files, creating the IRDB representation of the program, and producing
rewrite results that can be used by either static or dynamic binary rewriters.

We continued to fix both software errors and performance and scalability issues throughout
the Helix/Kevlar tool chain (e.g., STARS, Strata, Zipr, etc.).

4.3 Period 3: 16-AUG-2015 through 30-SEP-2015


4.3.1  Progress  Against  Planned  Objectives 

A  major  objective  of  this  effort  is  to  demonstrate  the  technical  readiness  of  portions  of  the  Helix 
technology  developed  under  contract  FA8650-10-C-7025,  FA8650-13-2-0096  and  others.  During 
this  period,  we  focused  on  robustness  of  our  static  analysis  and  binary  rewriting  technology. 

4.3.2  Technical  Accomplishments  this  Period 

We  leveraged  a  Red  Team  exercise  performed  by  Raytheon  for  another  project  to  test 
Helix/Kevlar’s  ability  to  process  the  Global  Positioning  Navigation  and  Timing  Systems 
(GPNTS)  being  developed  by  Raytheon  for  the  Navy. 

The  major  technical  accomplishments  this  period  are: 

• Retargeted Helix/Kevlar to Red Hat Enterprise Linux.

•  Refactored  the  STARS  interfaces  so  that  analysis  could  be  done  on  transformed  binaries. 
This  change  enables  easy  composition  of  defenses. 

•  Worked  on  moving  Helix/Kevlar  to  the  Cloud  to  enable  users  to  apply  Helix/Kevlar 
protections  through  an  easy-to-use  web  interface.  It  also  allows  the  use  of  high-end, 
scalable  computing  resources  for  various  tasks. 

4.3.3  Improvements  to  Prototypes  This  Period 

We have made numerous improvements to the Helix/Kevlar toolchain, in particular for x86-64.
These include:

• Ability to run Helix/Kevlar on Red Hat Enterprise Linux (RHEL) and process RHEL
binaries.

• Fixed numerous bugs in STARS that were exposed through the Raytheon Red Team
exercise.

In addition, we continue to fix bugs we encounter as part of the porting process to x86-64.

The net result is a vast improvement in the robustness of Kevlar across both the x86-32 and the
x86-64 versions.

4.4  Results  Discussion 

During the period of the project, we have done much to understand and promote the possible
transition of Helix/Kevlar. It is important to note that Helix/Kevlar is playing a key role in two
new efforts. First, Helix/Kevlar is a key component of our entry in DARPA’s Cyber Grand
Challenge. We are one of seven teams (out of 104) that have advanced to the finals, to be held at
DEF CON 24 in Las Vegas on August 4-7, 2016.

Second, Helix/Kevlar is also a key component of our effort within DARPA’s Cyber Fault-tolerant
Attack Recovery (CFAR) program. This project seeks to build N-variant systems that are
provably able to withstand attack. Helix/Kevlar’s diversity transforms are heavily used.

In addition, various meetings, publications and presentations have helped us connect with possible
transition partners and taught us the constraints customers may have for a transitionable
technology. Our porting to Windows, Solaris and Red Hat platforms has both helped us gain a
deeper understanding of these issues and constraints and made our prototype technology
more attractive to potential partners. In particular, we have learned that different operating
systems have different default compilers, which can pose significant challenges to a tool that
works on the compiler’s output (a binary program).

5.0  CONCLUSIONS 

Security weaknesses in DoD information systems remain a major challenge for system
stakeholders. We have advanced the transition of technology developed under the Helix and
PEASOUP projects to protect Air Force systems of interest. The results are expected to be an
asset that, if widely deployed by the DoD, would enable a high level of confidence in the security
of DoD systems, in particular, confidence that certain classes of critical vulnerabilities are no
longer subject to possible exploitation.

We  have  leveraged  the  opportunity  to  take  the  Helix  architecture  one  step  closer  to 
deployment  in  real  systems  by  developing  Helix/Kevlar,  a  completely  automatic  system  for 
securing  applications  against  attack  by  well-funded,  determined  malicious  adversaries. 
Helix/Kevlar  armors  binary  programs  and  protects  them  from  attacks  which  could  arise  from  the 
inevitable  vulnerabilities  that  remain  after  deployment.  The  source  code  is  not  required  nor  are 
any  other  development  artifacts.  These  features  make  Helix/Kevlar  of  particular  value  for 
software  systems  that  have  to  be  used  but  for  which  no  development  information  is  available. 

During this project we have done much to understand and promote the possible transition of
Helix/Kevlar. First, various meetings, publications and presentations have helped us connect with
possible transition partners and taught us the constraints customers may have for a transitionable
technology. Our porting to Windows and Solaris platforms has both helped us gain a deeper
understanding of these issues and constraints and made our prototype technology more
attractive to potential partners. We have learned that different operating systems
have different default compilers, which can pose significant challenges to a tool that works on
the compiler’s output (a binary program). In particular, some of our Linux-based tools assumed a
particular calling convention, and different systems use different calling conventions. Abstracting
the calling convention as much as possible eases technology transition.

LIST OF SYMBOLS, ABBREVIATIONS AND ACRONYMS


DoD      Department of Defense
PEASOUP  Preventing Exploits Against Software of Uncertain Provenance
DARPA    Defense Advanced Research Projects Agency
IARPA    Intelligence Advanced Research Projects Agency
NSF      National Science Foundation
VM       Virtual Machine
IET      Institution of Engineering and Technology
IEEE     Institute of Electrical and Electronics Engineers
ROP      Return-Oriented Programming
IFIP     International Federation for Information Processing
IRDB     Intermediate Representation Database
SDT      Software Dynamic Translator
TRL      Technical Readiness Level
STARS    STatic Analysis of Reactive Systems
BED      Behavior Equivalence Detection
TSET     Test Suite Evaluation Technology
IR       Intermediate Representation
SSA      Static Single Assignment
SPRI     Sprocket Program Rewriting Interface
ILX      Instruction Location Transformation
API      Application Program Interface
PC       Program Counter
IB       Indirect Branches
IBT      IB Target
LLVM     Low Level Virtual Machine
RAM      Random Access Memory
CPU      Central Processing Unit
GB       Gigabyte
SLX      Stack Layout Transformation
ASLR     Address Space Layout Randomization
HLX      Heap Layout Transformation
PCC      PC Confinement
TTCP     Technical Cooperation Program
USPS     United States Postal Service
GCCS     Global Command and Control System
JVM      Java Virtual Machine
GUI      Graphical User Interface
GPNTS    Global Positioning Navigation and Timing Systems
RHEL     Red Hat Enterprise Linux